Automatic speech recognition (ASR) performs best when the recognition system is tested under conditions identical to those in which it was trained. In practice, however, the training and testing environments are often mismatched, largely because of the noise sources present in real environments. Speech enhancement techniques have been developed to make ASR systems robust against such noise. In this work, a method based on histogram equalization (HEQ) is proposed to compensate for nonlinear distortions in the speech representation. The approach uses simultaneous stereo recordings of clean speech and the corresponding noisy speech to train a stereo Gaussian mixture model (GMM). The stereo GMM is then used to compute the cumulative distribution function (CDF) of both clean and noisy speech using a sigmoid function, rather than the order statistics used in other HEQ-based methods. Two ways of applying HEQ are presented: hard-decision HEQ and soft-decision HEQ, the latter based on minimum mean square error (MMSE) clean speech estimation. The experiments show that both soft HEQ and hard HEQ achieve better recognition results than other HEQ approaches such as tabular HEQ, quantile HEQ and polynomial-fit HEQ, and that soft HEQ achieves notably better recognition results than hard HEQ. The experiments also show that HEQ improves the effectiveness of other speech enhancement techniques such as stereo piece-wise linear compensation for environment (SPLICE) and vector Taylor series (VTS), and that using HEQ in multi-style training (MST) significantly improves ASR system performance.
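
For orientation, the order-statistics form of HEQ that the abstract contrasts with can be sketched as follows. This is a minimal illustration of the generic baseline only, not the stereo-GMM, sigmoid-based method proposed in this work; the function name `heq_order_statistics` and the toy data are assumptions made for illustration, and a single scalar feature dimension is assumed.

```python
from bisect import bisect_right


def heq_order_statistics(x, noisy_train, clean_train):
    """Map a noisy feature value onto the clean reference distribution.

    Generic order-statistics HEQ (illustrative baseline, hypothetical helper):
    find the empirical CDF value of x among the noisy training samples, then
    return the clean training sample at the same CDF value.
    """
    noisy_sorted = sorted(noisy_train)
    clean_sorted = sorted(clean_train)
    n_noisy, n_clean = len(noisy_sorted), len(clean_sorted)

    # Rank of x among noisy samples; the empirical CDF is k / n_noisy.
    k = bisect_right(noisy_sorted, x)

    # Smallest clean index i with (i + 1) / n_clean >= k / n_noisy,
    # computed with integer arithmetic to avoid rounding issues.
    idx = (k * n_clean + n_noisy - 1) // n_noisy - 1

    # Clamp so values outside the training range map to the extremes.
    idx = min(max(idx, 0), n_clean - 1)
    return clean_sorted[idx]


# Toy data (assumed): the "noisy" feature is a scaled and shifted
# version of the "clean" feature.
clean = [0.0, 1.0, 2.0, 3.0, 4.0]
noisy = [10.0, 12.0, 14.0, 16.0, 18.0]

equalized = [heq_order_statistics(v, noisy, clean) for v in noisy]
```

Because the mapping depends only on ranks, any monotonic distortion of the feature is undone on the training samples themselves. The cost is a sort and a table lookup per feature dimension, which is the step the abstract describes replacing with a CDF computed from a stereo GMM.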